HappyDB is a corpus of 100,000 crowd-sourced happy moments via Amazon’s Mechanical Turk. You can read more about it on https://arxiv.org/abs/1801.07746.
Here, we explore this data set and try to answer the question, “What makes people happy?”
From the packages’ descriptions:
tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures;tidytext allows text mining using ‘dplyr’, ‘ggplot2’, and other tidy tools;DT provides an R interface to the JavaScript library DataTables;scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends;wordcloud2 provides an HTML5 interface to wordcloud for data visualization;gridExtra contains miscellaneous functions for “grid” graphics;ngram is for constructing n-grams (âtokenizingâ), as well as generating new text based on the n-gram structure of a given text input (âbabblingâ);Shiny is an R package that makes it easy to build interactive web apps straight from R;library(tidyverse)
library(tidytext)
library(DT)
library(scales)
library(wordcloud2)
library(gridExtra)
library(ngram)
library(shiny)
library(ggraph)
library(igraph)
We use the processed data for our analysis and combine it with the demographic information available.
hm_data <- read_csv("../output/processed_moments.csv")
urlfile<-'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv'
demo_data <- read_csv(urlfile)
We select a subset of the data that satisfies specific row conditions.
hm_data <- hm_data %>%
inner_join(demo_data, by = "wid") %>%
select(wid,
original_hm,
gender,
marital,
parenthood,
reflection_period,
age,
country,
ground_truth_category,
text) %>%
mutate(count = sapply(hm_data$text, wordcount)) %>%
filter(gender %in% c("m", "f")) %>%
filter(marital %in% c("single", "married")) %>%
filter(parenthood %in% c("n", "y")) %>%
filter(reflection_period %in% c("24h", "3m")) %>%
mutate(reflection_period = fct_recode(reflection_period,
months_3 = "3m", hours_24 = "24h"))
datatable(hm_data)
It seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.htmlIt seems your data is too big for client-side DataTables. You may consider server-side processing: https://rstudio.github.io/DT/server.html